ABSTRACT

This study, based on the Rasch model, used R. H.
Smith's (1986) classification of measurement disturbances to assess
the Rasch model approach to error control and statistical prediction.
Partitioning the error component into a person component, an
item-person interaction component, and a random unexplained error
component has the net effect of reducing the error variance and
improving statistical prediction and classification efficiency. The
specific goal of this study is to determine whether the Item and
Person Analysis using the Rasch Model (IPARM) program can be used to
correct a person's score to eliminate person disturbances. The United
States Air Force Placement and Validation Examination was administered
to 1,200 freshmen entering the Air Force Academy in Colorado Springs;
the French language portion of the test was used. Test data were
analyzed using IPARM to identify individual response disturbances and
to estimate each person's true score. IPARM can assess item
disturbances as well as person disturbances, but this study focused
on the latter. Essentially, IPARM uses a person's total score to
establish statistical expectancies for each item the individual
attempted. Correlations of the test results with French grades and
class rankings indicated that the IPARM-based corrections are useful
for improving prediction and classification. (TJH)






IMPROVING PREDICTION BY CORRECTING TEST SCORES FOR PERSON DISTURBANCES

USING THE RASCH MODEL 



Philip Jean-Louis Westfall and Ayres G. D'Costa
U.S. Air Force Academy    The Ohio State University






Background

Several efforts have been made to refine the classical 
analysis of an observed score as the sum of a true score 
component for the individual and a random error component. Pike 
(1978) identified four components: a true score, a primary test- 
specific component, a secondary test-specific component and the 
usual random error component. This study is based upon the Rasch
Model (Wright and Stone, 1979) and uses Smith's (1986)
classification of measurement disturbances into two types: those
associated with the person, and those related to the item-person
interaction.

The focus of this study is on Smith's approach to error
control, rather than on Pike's efforts at true score analysis.
Our interest is in statistical prediction or classification, rather
than in an analysis of the sources of intellectual ability. We will
ignore the dimensionality of the true score and assume it to be
unidimensional. Partitioning the error component into a person
component, an item-person interaction component, and a random
unexplained error component has the net effect of reducing the
error variance and improving the efficiency of statistical
prediction and classification.
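As an illustrative sketch only, the partitioning described above can be written as follows. The symbols are ours, and we assume (as the text does not state explicitly) that the components are mutually uncorrelated, uncorrelated with the true score, and uncorrelated with the criterion. Write the observed score X as a true score T plus a person component E_p, an item-person interaction component E_ip, and a random component E_r:

$$X = T + E_p + E_{ip} + E_r, \qquad \sigma^2_X = \sigma^2_T + \sigma^2_{E_p} + \sigma^2_{E_{ip}} + \sigma^2_{E_r}$$

For a criterion Y, such as a later course grade, the observed validity coefficient is then attenuated by the error variance:

$$\rho_{XY} = \frac{\operatorname{Cov}(T, Y)}{\sigma_X \, \sigma_Y} = \rho_{TY} \, \frac{\sigma_T}{\sigma_X}$$

A corrected score X* = X - E_p that removed the person component exactly would have a variance of the observed-score variance minus that of E_p, so its correlation with Y, equal to the true-score correlation times σ_T/σ_X*, exceeds the original correlation whenever the true-score correlation is positive and E_p has nonzero variance. In practice the person component is only estimated, so any real gain would be smaller.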

Smith identified eight person-related disturbances: start-up
test anxiety, excessive cautiousness or plodding, copying or
cheating, external distractions, illness, systematic guessing,
random guessing, and excessive carelessness or sloppiness. He
also identified five disturbances due to item-person interaction:
guessing when the item is too difficult, overconfidence, an item
being over- or under-learned, item type-style bias, and item
information bias.

The research question of this study is as follows: What if
it were possible, using the Rasch model and specifically the IPARM
(Item and Person Analysis using the Rasch Model) program, to
correct a person's score to eliminate these disturbances? Would
the resulting score, presumably a better representation of the
individual's ability, improve prediction of future performance?
IPARM was developed by Smith, Wright, and Green (1987).



Study Design and Rationale 



The USAF Placement and Validation Examination (PLAVAL) was
administered to the entire group of 1,200 freshmen entering the
Academy. Cadets are classified into three levels of additional
French language training (low, intermediate, advanced) based on
PLAVAL scores. Misclassification has serious cost implications
because of its financial, personal, and opportunity-related effects.

PLAVAL data were analyzed using IPARM to identify
individual response disturbances and estimate each person's true
score. IPARM can examine item disturbances as well as person
disturbances, but this study focused only on the latter.
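The true-score estimate referred to here can be illustrated with a minimal Python sketch. It assumes the dichotomous Rasch model with item difficulties already calibrated in logits; the function names, the Newton-Raphson solver, the step damping, and the starting value are our illustrative choices, not IPARM's documented internals. Under the model, the maximum-likelihood ability for a raw score r is the value of θ at which the expected score, the sum of the item probabilities P_i(θ), equals r.

```python
import math
from typing import Sequence


def rasch_prob(theta: float, b: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def ability_from_raw_score(raw: int, difficulties: Sequence[float],
                           tol: float = 1e-8, max_iter: int = 100) -> float:
    """Maximum-likelihood ability (in logits) for a raw score on calibrated items.

    Solves sum_i P_i(theta) = raw by Newton-Raphson. Zero and perfect scores
    have no finite maximum-likelihood estimate, so they are rejected.
    """
    n = len(difficulties)
    if not 0 < raw < n:
        raise ValueError("raw score must be strictly between 0 and the number of items")
    # Starting value: log-odds of the proportion correct, shifted by mean difficulty
    theta = math.log(raw / (n - raw)) + sum(difficulties) / n
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        info = sum(p * (1.0 - p) for p in probs)   # test information at theta
        step = (raw - sum(probs)) / info           # Newton-Raphson update
        step = max(-1.0, min(1.0, step))           # damp large jumps (our assumption)
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical difficulties for six items, symmetric about zero
items = [-1.5, -0.5, -0.25, 0.25, 0.5, 1.5]
for r in range(1, len(items)):
    print(r, round(ability_from_raw_score(r, items), 3))
```

With these hypothetical, symmetric difficulties a raw score of 3 maps to an ability of 0 logits. Because the estimate depends only on the raw score, two very different response patterns with the same total receive the same ability, which is why a separate fit analysis is needed to detect disturbances.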

Essentially, IPARM uses a person's total score to establish
statistical expectancies for each item the individual attempted.
If a low-ability person correctly answered a very difficult
question, the standardized residual (very much like the Sato
Caution Index) becomes large. Whereas the Sato Caution Index
(Harnish, 1983) is reported as is, IPARM notes that the sum of
these squared residuals approximates a chi-square distribution and
can therefore provide a test of goodness of fit. Fit statistics can
be provided for one item, a group of items, or an entire test. It
is the analysis of groups of items that leads to discovering the
nature of any misfit that is present, and to applying an
appropriate correction. Test items can be divided into subgroups
representing difficulty, order of presentation, item style, or
content. Each subgroup is analyzed as a test independent of the
other subgroups, with fit statistics characterizing how closely an
individual's responses match the expected response pattern.
Estimates of ability are then determined for each subgroup.
Items contributing most to an individual's misfit statistics are
eliminated.
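The residual, fit, subgroup, and item-elimination steps in this paragraph can be sketched in Python as follows. This is a simplified stand-in, not IPARM itself: the outfit-style mean square, the split into an easier and a harder half, and the rule of dropping only the single item with the largest absolute residual are assumptions made for illustration, and the item difficulties and responses are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rasch_prob(theta: float, b: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def ability(responses: Sequence[int], difficulties: Sequence[float]) -> float:
    """Maximum-likelihood ability (logits) from the raw score; needs 0 < raw < n."""
    raw, n = sum(responses), len(responses)
    if not 0 < raw < n:
        raise ValueError("no finite estimate for a zero or perfect score")
    theta = math.log(raw / (n - raw)) + sum(difficulties) / n
    for _ in range(100):
        probs = [rasch_prob(theta, b) for b in difficulties]
        step = (raw - sum(probs)) / sum(p * (1.0 - p) for p in probs)
        theta += max(-1.0, min(1.0, step))   # damped Newton-Raphson step
        if abs(step) < 1e-10:
            break
    return theta


def standardized_residuals(responses: Sequence[int], difficulties: Sequence[float],
                           theta: float) -> List[float]:
    """z_i = (x_i - P_i) / sqrt(P_i * (1 - P_i)) for each attempted item."""
    z = []
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        z.append((x - p) / math.sqrt(p * (1.0 - p)))
    return z


def mean_square(z: Sequence[float]) -> float:
    """Average squared residual (outfit-style); values near 1 indicate fit."""
    return sum(v * v for v in z) / len(z)


def correct_for_worst_item(responses: Sequence[int],
                           difficulties: Sequence[float]) -> Tuple[int, float, float]:
    """Drop the item with the largest |z| and re-estimate ability.

    Dropping exactly one item is an illustrative rule only. Raises ValueError
    if the remaining items yield a zero or perfect score.
    """
    theta = ability(responses, difficulties)
    z = standardized_residuals(responses, difficulties, theta)
    worst = max(range(len(z)), key=lambda i: abs(z[i]))
    keep = [i for i in range(len(z)) if i != worst]
    theta_corrected = ability([responses[i] for i in keep],
                              [difficulties[i] for i in keep])
    return worst, theta, theta_corrected


# Hypothetical calibrated items (easy to hard) and one low scorer's responses
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
responses = [1, 0, 0, 0, 0, 1]          # correct on the hardest item

theta = ability(responses, difficulties)
print("whole-test mean square:",
      round(mean_square(standardized_residuals(responses, difficulties, theta)), 2))

# Subgroups analysed as separate tests: easier half vs harder half
print("easy-half ability:", round(ability(responses[:3], difficulties[:3]), 2))
print("hard-half ability:", round(ability(responses[3:], difficulties[3:]), 2))

print("dropped item, ability before/after:", correct_for_worst_item(responses, difficulties))
```

In this hypothetical pattern the correct answer on the hardest item produces the largest residual, the two halves give ability estimates three logits apart (same raw score on items shifted by three logits), and dropping the flagged item lowers the ability estimate.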

The research design compared the success of the
classification of the 1,200 USAF cadets into the three French
class levels using three methods of prediction: the usual raw
score, a regular Rasch-model-based true score, and an
IPARM-based corrected true score. First, success was measured in
terms of correlation with French grades earned by each cadet in
the freshman year. Second, success was measured in terms of the
French class level into which each cadet was classified.
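The two success criteria can be sketched in Python: a Pearson correlation between scores and French grades computed separately within each class level, and the percentage of cadets whose predicted level matches their actual level. The function names are hypothetical; the paper names the statistics but does not describe how they were computed.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlations_by_level(scores: Sequence[float], grades: Sequence[float],
                          levels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Correlate scores with grades separately within each class level."""
    groups: Dict[Hashable, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for s, g, lvl in zip(scores, grades, levels):
        groups[lvl][0].append(s)
        groups[lvl][1].append(g)
    return {lvl: pearson_r(xs, ys) for lvl, (xs, ys) in groups.items()}


def percent_correct(predicted: Sequence[Hashable], actual: Sequence[Hashable]) -> float:
    """Percentage of cadets whose predicted class level equals their actual level."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)
```

Under this sketch, each method (raw score, Rasch-only score, IPARM-corrected score) would be passed in turn as `scores`, so the three methods are compared on identical grades and level assignments.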



Results

Table 1 presents the Pearson correlation coefficients between
each method's scores and French grades, for each of the three
French class levels.

                Low     Intermediate   Advanced
  Raw Score     0.48    0.46           0.44
  Rasch Only    0.52    0.51           0.45
  IPARM         0.63    0.63           0.48

Table 2 presents the percentage of cadets correctly classified,
using discriminant analysis techniques, for the three French
classes.

                Low     Intermediate   Advanced
  Raw Score     56.9    52.4           51.6
  Rasch Only    61.5    52.4           45.3
  IPARM         63.1    66.7           53.1
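The paper does not say which discriminant analysis procedure was used. As one plausible minimal sketch, assuming a single predictor score, a pooled within-group variance, and priors proportional to group sizes, linear discriminant analysis assigns each cadet to the level k with the largest score x·μ_k/σ² - μ_k²/(2σ²) + ln π_k. The hit rate below is computed on the same cadets used to fit the model (resubstitution), which tends to be optimistic. All function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def fit_lda_1d(x: Sequence[float], labels: Sequence[Hashable]):
    """Group means, pooled within-group variance, and size-based priors."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for v, g in zip(x, labels):
        groups[g].append(v)
    n, k = len(x), len(groups)
    means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    # Pooled within-group variance with n - k degrees of freedom
    ss = sum((v - means[g]) ** 2 for g, vs in groups.items() for v in vs)
    var = ss / (n - k)
    priors = {g: len(vs) / n for g, vs in groups.items()}
    return means, var, priors


def classify(v: float, means, var: float, priors) -> Hashable:
    """Assign v to the group with the largest linear discriminant score."""
    def score(g):
        mu = means[g]
        return v * mu / var - mu * mu / (2.0 * var) + math.log(priors[g])
    return max(means, key=score)


def percent_correctly_classified(x: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Resubstitution hit rate: fit and evaluate on the same cadets."""
    means, var, priors = fit_lda_1d(x, labels)
    hits = sum(classify(v, means, var, priors) == g for v, g in zip(x, labels))
    return 100.0 * hits / len(x)
```

With one predictor and equal priors this rule reduces to cut scores halfway between adjacent group means, which shows how a score with less person-related error could move cadets across a classification boundary.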



Both sets of analyses indicate the superiority of the IPARM
technique over the Rasch method by itself, and clearly over the
usual raw-score method. The advanced group appears to be the
most hazardous to predict. This is understandable given
statistical regression (regression toward the mean).



Implications 

This study indicates the utility of IPARM-based
corrections, at least for improving prediction and
classification. It appears that person-related disturbances can
impinge upon the estimate of true score provided by tests, and
that this error in turn can adversely affect prediction and
classification. The use of IPARM techniques has educational
utility and needs further exploration as a diagnostic and
remediation device.

Our study did not explore whether the disturbances we 
corrected for were real, in the sense that the individual was 
personally aware of them. We were unable to conduct individual 
verification of the disturbances identified by IPARM. This is a 
serious limitation of this study and one that we had no control 
over because of restrictions imposed by USAF policy. We 
recommend that IPARM be used to provide individual feedback and 
assistance with remediation of test-taking "disturbances." 